Target-specific stance detection on social media, which aims at classifying a textual data instance such as a post or a comment into a stance class of a target issue, has become an emerging opinion mining paradigm of importance. An example application would be to overcome vaccine hesitancy in combating the coronavirus pandemic. However, existing stance detection strategies rely merely on the individual instances which cannot always capture the expressed stance of a given target. In response, we address a new task called conversational stance detection which is to infer the stance towards a given target (e.g., COVID-19 vaccination) when given a data instance and its corresponding conversation thread. To tackle the task, we first propose a benchmarking conversational stance detection (CSD) dataset with annotations of stances and the structures of conversation threads among the instances based on six major social media platforms in Hong Kong. To infer the desired stances from both data instances and conversation threads, we propose a model called Branch-BERT that incorporates contextual information in conversation threads. Extensive experiments on our CSD dataset show that our proposed model outperforms all the baseline models that do not make use of contextual information. Specifically, it improves the F1 score by 10.3% compared with the state-of-the-art method in the SemEval-2016 Task 6 competition. This shows the potential of incorporating rich contextual information on detecting target-specific stances on social media platforms and implies a more practical way to construct future stance detection tasks.
translated by 谷歌翻译
尽管在预验证的GAN模型的潜在空间中表现出的编辑能力,但倒置现实世界的图像被陷入困境,即重建不能忠于原始输入。这样做的主要原因是,训练和现实世界数据之间的分布未对准,因此,对于真实图像编辑而言,它不稳定。在本文中,我们提出了一个基于GAN的新型编辑框架,以通过组成分解范式解决室外反转问题。特别是,在构图阶段,我们引入了一个差分激活模块,用于从全局角度\ ie(IE)检测语义变化,这是编辑和未编辑图像的特征之间的相对差距。借助生成的diff-cam掩模,配对的原始图像和编辑图像可以直观地进行粗糙的重建。这样,几乎整体可以生存属性,而这种中间结果的质量仍然受到不可避免的幽灵效果的限制。因此,在分解阶段,我们进一步提出了一个基于GAN的基于GAN的DEGHOSTING网络,用于将最终的精细编辑图像与粗糙重建分开。在定性和定量评估方面,广泛的实验比最新方法具有优势。我们方法的鲁棒性和灵活性在两个属性和多属性操作的方案上也得到了验证。
translated by 谷歌翻译
图形神经网络(GNN)在许多基于图的学​​习任务中表现出很大的优势,但通常无法准确预测基于任务的节点集,例如链接/主题预测等。最近,许多作品通过使用随机节点功能或节点距离特征来解决此问题。但是,它们的收敛速度缓慢,预测不准确或高复杂性。在这项工作中,我们重新访问允许使用位置编码(PE)技术(例如Laplacian eigenmap,deepwalk等)的节点的位置特征。 。在这里,我们以原则性的方式研究了这些问题,并提出了一种可证明的解决方案,这是一类用严格数学分析的钉子的GNN层。 PEG使用单独的频道来更新原始节点功能和位置功能。 PEG施加置换量比W.R.T.原始节点功能并施加$ O(P)$(正交组)均值W.R.T.位置特征同时特征,其中$ p $是二手位置特征的维度。在8个现实世界网络上进行的广泛链接预测实验证明了PEG在概括和可伸缩性方面的优势。
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.
translated by 谷歌翻译
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
translated by 谷歌翻译
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as corresponding start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to the reasoning bias during frame-query interaction, reducing the generalization ability of model. To alleviate above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such mechanism is also able to supplement the absent consecutive visual semantics to the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
translated by 谷歌翻译
Human parsing aims to partition humans in image or video into multiple pixel-level semantic parts. In the last decade, it has gained significantly increased interest in the computer vision community and has been utilized in a broad range of practical applications, from security monitoring, to social media, to visual special effects, just to name a few. Although deep learning-based human parsing solutions have made remarkable achievements, many important concepts, existing challenges, and potential research directions are still confusing. In this survey, we comprehensively review three core sub-tasks: single human parsing, multiple human parsing, and video human parsing, by introducing their respective task settings, background concepts, relevant problems and applications, representative literature, and datasets. We also present quantitative performance comparisons of the reviewed methods on benchmark datasets. Additionally, to promote sustainable development of the community, we put forward a transformer-based human parsing framework, providing a high-performance baseline for follow-up research through universal, concise, and extensible solutions. Finally, we point out a set of under-investigated open issues in this field and suggest new directions for future study. We also provide a regularly updated project page, to continuously track recent developments in this fast-advancing field: https://github.com/soeaver/awesome-human-parsing.
translated by 谷歌翻译
A storyboard is a roadmap for video creation which consists of shot-by-shot images to visualize key plots in a text synopsis. Creating video storyboards however remains challenging which not only requires association between high-level texts and images, but also demands for long-term reasoning to make transitions smooth across shots. In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS) which aims to retrieve an ordered sequence of images to visualize the text synopsis. We construct a MovieNet-TeViS benchmark based on the public MovieNet dataset. It contains 10K text synopses each paired with keyframes that are manually selected from corresponding movies by considering both relevance and cinematic coherence. We also present an encoder-decoder baseline for the task. The model uses a pretrained vision-and-language model to improve high-level text-image matching. To improve coherence in long-term shots, we further propose to pre-train the decoder on large-scale movie frames without text. Experimental results demonstrate that our proposed model significantly outperforms other models to create text-relevant and coherent storyboards. Nevertheless, there is still a large gap compared to human performance suggesting room for promising future work.
translated by 谷歌翻译
In the new era of personalization, learning the heterogeneous treatment effect (HTE) becomes an inevitable trend with numerous applications. Yet, most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency in the common panel data setting. The treatment evaluators developed for panel data, on the other hand, typically ignore the individualized information. To fill the gap, in this paper, we initialize the study of HTE estimation in panel data. Under different assumptions for HTE identifiability, we propose the corresponding heterogeneous one-side and two-side synthetic learner, namely H1SL and H2SL, by leveraging the state-of-the-art HTE estimator for non-panel data and generalizing the synthetic control method that allows flexible data generating process. We establish the convergence rates of the proposed estimators. The superior performance of the proposed methods over existing ones is demonstrated by extensive numerical studies.
translated by 谷歌翻译